Abstract

We present πBox, a new application platform that prevents apps from
misusing information about their users. To strike a useful balance
between users’ privacy and apps’ functional needs, πBox shifts much of
the responsibility for protecting privacy from the app and its users to
the platform itself. To achieve this, πBox deploys (1) a sandbox that
spans the user’s device and the cloud, (2) specialized storage and
communication channels that enable common app functionalities, and (3)
an adaptation of recent theoretical algorithms for differential privacy
under continual observation. We describe a prototype implementation of
πBox and show how it enables a wide range of useful apps with minimal
performance overhead and without sacrificing user privacy.

1  Introduction 

On mobile platforms such as iOS and Android, Web browsers such as
Google Chrome, and even smart televisions such as Google TV or Roku,
hundreds of thousands of software apps provide services to users. Their
functionality often requires access to potentially sensitive user data
(e.g., contact lists, passwords, photos), sensor inputs (e.g., camera,
microphone, GPS), and/or information about user behavior.

Most apps use this data responsibly, but there has also been evidence
of privacy violations [2, 36, 43, 54, 56]. Corporations often restrict
what apps employees can install on their phones to prevent an untrusted
app, or a cloud provider that an app communicates with, from leaking
proprietary information [11, 28].

There is an inherent trade-off between users’ privacy and apps’
functionality. An app with no access to user data (e.g., one running in
Native Client [39]) cannot leak anything sensitive, but many apps cannot
function without such data. For example, a password management app
needs access to passwords, an audio transcription app needs access to
recordings of the user’s speech, etc.


Existing confinement mechanisms deployed on platforms such as iOS and
Android rely on users to explicitly grant permissions to apps. In
theory, users can decide how much privacy to sacrifice for
functionality. In practice, permissions are very coarse-grained (e.g.,
an app that has permission to access the network can send out whatever
it wishes to whomever it wishes), and apps often request more
permissions than they need [19, 25] and use granted permissions in
unexpected ways (e.g., an app with permission to show the user’s
location on a map may transmit this location to other parties). Users,
who are inundated with permission requests and may not fully understand
the implications, often blindly grant all requests [20] or even disable
notifications [37], implicitly entrusting all apps with their private
data.

Our contributions. This paper describes πBox, a new platform for
confining untrusted apps that balances apps’ functional needs against
their users’ privacy, largely preserving both. To achieve this balance,
πBox isolates each user’s instance of an app from the other instances
and users, and only allows communication through a few well-defined
channels whose functionality meets the needs of many apps. Because
these channels are controlled by πBox, πBox can give rigorous privacy
guarantees about the information that flows through them.

The key idea behind πBox is to shift much of the responsibility for
protecting user privacy from the apps to the platform. We use three
novel technical mechanisms:

1.  A  sandbox  that  spans  a  user’s  device  and  a  cloud 
back-end.  The  latter  may  be  supplied  by  the  device’s 
platform  provider  (e.g.,  Apple  or  Google)  or  another 
entity  (e.g.,  the  user’s  employer). 

2. Five specialized storage and communications systems that enable a
variety of apps to do useful work within πBox while preserving user
privacy.

3. An adaptation and implementation of differential privacy under
continual observation that improves the trade-off between accuracy and
privacy of released statistics (e.g., ad impression counts).


Because πBox’s sandbox spans the device and the cloud, πBox can help
enterprises deploy bring-your-own-app (BYOA) policies that allow users
to execute apps from untrusted publishers on a trusted platform. This
platform may run on the premises under the enterprise’s direct control
or be part of an external “app store” or hosting infrastructure. Similar
to bring-your-own-device (BYOD) policies, where companies install
profiles and security software on employee-owned devices used for work,
a company might restrict apps to run only within πBox, thus ensuring
that these apps, and any information they access, are securely confined.

This paper addresses three research questions raised by this
architecture. Can we construct useful apps under these constraints? Can
we adapt differentially private aggregation to an environment where app
providers need to query periodically updated statistics of user
activities? Are the overheads of πBox acceptable?

To answer these questions, we constructed (1) a prototype of πBox and
(2) a set of sample apps that represent common app types and
demonstrate the utility of our platform: a cloud-backed password vault,
an ad-supported news reader, and a transcription service. We also
ported two open-source Android apps: the OsmAnd navigation app [41] and
ServeStream, an HTTP-streaming media player and media server browser
[51]. In Section 2.5, we explain in more detail the classes of apps and
app features supported by πBox.

πBox uses differential privacy to prevent aggregate statistics from
leaking too much information about users to app publishers.
Conventional differentially private queries on static datasets can be
very inaccurate when the input data is changing due to user behavior.
Instead, we apply algorithms for differential privacy under continual
observation [16], in particular delayed-output counters. We also list
the parameters that enable an app publisher to tune the amount,
frequency, and/or accuracy of the reported statistics subject to the
platform’s bound on the rate of information leakage. The resulting
relative error rates on real-world traces are five times lower than
with conventional differentially private counters.

The paper proceeds as follows. Section 2 presents an overview of πBox’s
design. Section 3 shows how πBox deploys differential privacy under
continual observation and privacy-preserving top-K lists to implement
aggregate channels. Section 4 describes our prototype implementation.
Section 5 evaluates it and describes the apps we developed or ported to
πBox. Section 6 discusses related work. Section 7 concludes.

2  Design 

πBox is a platform for executing apps and associated remote services.
There are three types of principals involved in πBox: (1) the platform
provider, who supplies the client (either software, e.g., Google
Chrome, or both hardware and software, e.g., Apple iPhone, Google Nexus
7, or Kindle Fire), as well as the cloud resources on which app
instances execute, and deploys πBox on both the client and the cloud;
(2) users, who invoke and use untrusted apps on their local devices and
their slice of the cloud; and (3) publishers, who provide apps, content
for apps, and/or advertisements.

FIGURE 1: Architecture of πBox.

2.1  Threat  model 

πBox is based on the following design philosophy: do not trust the apps
or rely on the users to make fine-grained privacy decisions; instead,
trust the platform to enforce privacy. We argue that trusting the
platform provider is far more reasonable than expecting users to judge
the trustworthiness of many different, often obscure app publishers.
After all, users must already trust the platform provider to not leak
their private data. Furthermore, third-party platform providers are
often trusted brands such as Google, Apple, and Amazon that have strong
incentives to take care of their customers’ data. Therefore, we assume
that both users and app publishers trust the platform, but users do not
trust the publishers. Furthermore, we neither assume that the provider
trusts the publishers, nor rely on auditing by the provider to
eliminate misbehaving apps.1

πBox is thus designed for the scenario where an untrusted app runs in a
trusted sandbox. In this model, the app’s publisher may be malicious,
the code of the app may attempt to leak users’ private data or reveal
information about its users to the publisher, some of the app’s users
may be colluding with the app in an attempt to learn other users’ data,
etc. That said, the attacker is subject to standard computational
feasibility constraints (e.g., the attacker cannot subvert
cryptographic primitives).

1 Platforms that do audit apps, such as Google Play, provide additional
assurance that is complementary to what πBox provides.

The sandbox provided by πBox is assumed to be trusted. This includes
both the components running on the client device and those running in
the cloud. Like any software, if πBox is implemented incorrectly, it
may be subject to code injection and other attacks that compromise the
“ideal sandbox” abstraction. These attacks are outside the scope of
this paper, which focuses primarily on the design of the sandbox.
Another way in which the “ideal sandbox” abstraction may be violated is
via covert (e.g., timing) channels between processes running in the
sandbox and those outside the sandbox [33, 47]. If an implementation of
πBox is vulnerable to such channels, apps may be able to exfiltrate
private data.

There has been much research on sandboxing mechanisms (e.g., [27, 31,
60], among others). This work is orthogonal and complementary to the
design of πBox and can be applied to any implementation thereof.

2.2  Extended  sandbox 

Apps in πBox have two halves: one runs locally on the user’s device,
the other (optional) runs remotely in the cloud. πBox, executing as the
platform both on the device and in the cloud,2 supplies a per-user,
per-app sandbox that spans the device and the cloud. In effect, πBox
provides the abstraction that a slice of the cloud is part of the
user’s device: all of the app’s computations and storage are done
within this “distributed” device, which is otherwise isolated to
protect the user’s privacy.

The local half of an app running on the user’s device can only connect
to the remote half associated with the same app and user. The local
half does so by making a request to the authentication service running
as part of the platform on the device. This service sends the user’s
credentials and the app’s ID to the authentication manager running as a
part of the platform in the cloud (see Figure 1). Upon successful
authentication, the authentication manager starts up the requesting
app’s remote half for that specific user and opens a secure channel
between the local and remote halves.
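
The connection flow above can be sketched in a few lines of Python. This
is only an illustration under stated assumptions, not the prototype's
code: the class name AuthManager, the connect method, the password-style
credentials, and the random token standing in for the secure channel are
all hypothetical.

```python
import hmac
import secrets
from dataclasses import dataclass, field


@dataclass
class AuthManager:
    """Hypothetical cloud-side authentication manager (illustrative only)."""
    credentials: dict                      # user -> secret; stand-in for real credentials
    remote_halves: dict = field(default_factory=dict)

    def connect(self, user: str, secret: str, app_id: str) -> str:
        # Reject unknown users and wrong credentials.
        expected = self.credentials.get(user)
        if expected is None or not hmac.compare_digest(expected, secret):
            raise PermissionError("authentication failed")
        # One remote half per (user, app) pair, started on first use.
        key = (user, app_id)
        if key not in self.remote_halves:
            # The token is a placeholder for the secure channel handle.
            self.remote_halves[key] = secrets.token_hex(16)
        return self.remote_halves[key]
```

The point of keying the remote half on the (user, app) pair is that a
local half can never be handed a channel to another user's or another
app's slice of the cloud.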

2.3  Storage  and  communication 

An app running within πBox cannot write data or establish network
connections outside of the sandbox. To support app functionality, πBox
provides five restricted storage and communication channels (see Table
1).

The private vault provides per-sandbox (i.e., per-user, per-app) storage
that lets an app instance store data specific to a particular user
(e.g., user profile, location, query history, etc.) in order to provide
personalized services. For example, a password app may use the vault to
store the user’s passwords, while a news reader app may store keywords
of the articles the user has read. Each sandboxed app instance has
read/write access to its own private vault; no one else has any access
rights.

2 Apple (iOS/iCloud) and Google (Android/Cloud Services) already provide
app platforms that extend from users’ devices to the cloud.

Channel            Scope          Writer     Reader           Purpose
Private vault      per-sandbox    App        App              Store user-specific data
Content storage    per-publisher  Publisher  App              Store app data and content
Aggregate channel  per-app        App        Publisher        Collect usage statistics
Inbox              per-sandbox    Publisher  App              Receive shared content, notifications from publisher
Sharing channel    per-sandbox    App        App (via inbox)  Share content

TABLE 1: Channels in πBox. Content storage and the aggregate channel
are shared channels for all users of an app.

The content storage provides per-publisher storage for the content that
app instances need to function, e.g., maps for a navigation app. Each
publisher has read/write access to its own content storage so that the
publisher can (1) update the content and (2) grant read-only access to
apps that need this content. Apps may draw content from multiple
publishers’ content storage. For example, an ad-supported news reader
may load news articles from a news publisher’s storage and ads from an
ad broker’s storage. Although content storage is shared across all
sandboxes that have access to it, read-only access prevents
communication between app instances.

The aggregate channel provides a per-app channel (shared among all
instances of an app) for publishers to collect statistics on users’
collective behavior while protecting privacy of individual users. For
example, publishers of advertising-supported apps may collect the total
number of ad impressions, but not which user viewed which ad. Similarly,
publishers of news or video streaming apps may learn which articles or
videos are popular, but not who viewed what content. Publishers have
read access to their respective aggregate channels, and each app has
write access to its channel. In Section 3, we describe how πBox employs
differential privacy to protect data released via this channel.

The inbox provides per-sandbox storage for the user of a particular app
instance to receive information from the app’s publisher as well as the
content, if any, shared by other users of the same app. Each sandbox
has read/write access to its inbox. All writes from the publisher or
other users must go through πBox; when publishers want to communicate
with their apps’ users, they submit messages with the user as the
recipient, and πBox delivers the message to the appropriate inbox.

Finally, the sharing channel provides a per-sandbox method for sharing
content with other users of the same app. To ensure that all recipients
of the shared content are explicitly approved by the user, we rely on a
trusted, platform-controlled dialog box (similar to a “powerbox,” which
is traditionally used to restrict the paths an app can access [34,
50]). When a user wants to share content from an app, the app writes
the data to be shared into its own sharing channel (to which no other
sandbox has access) and notifies the platform. πBox controls the rest
of the sharing process: it (1) reads in the data, (2) presents the data
to the user in a dialog box that explicitly notifies the user about the
imminent sharing of the presented data, (3) prompts the user to confirm
the recipients, and, upon confirmation, (4) writes the shared content
to the inboxes of the designated recipients’ sandboxes. This design
ensures that users are aware of when and with whom sharing occurs, but
it cannot prevent the app from surreptitiously leaking private
information in the shared data (e.g., through steganography).
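
The access rights described for the five channels can be summarized as
a small lookup table. The sketch below is an illustrative reading of the
prose above, not the prototype's enforcement code; the channel keys, the
Principal enum, and the allowed helper are hypothetical names.

```python
from enum import Enum


class Principal(Enum):
    APP = "app instance"       # the sandboxed instance that owns the channel
    PUBLISHER = "publisher"    # the publisher of the app or of its content
    PLATFORM = "platform"      # the platform itself, mediating deliveries


# Rights per channel as described in the text (key names are hypothetical).
ACCESS = {
    "private_vault":   {Principal.APP: {"read", "write"}},
    "content_storage": {Principal.APP: {"read"},
                        Principal.PUBLISHER: {"read", "write"}},
    # Publishers read only the perturbed (differentially private) values.
    "aggregate":       {Principal.APP: {"write"},
                        Principal.PUBLISHER: {"read"}},
    # Publisher and peer messages reach the inbox only via the platform.
    "inbox":           {Principal.APP: {"read", "write"},
                        Principal.PLATFORM: {"write"}},
    # The app writes; the platform reads and shows the data to the user.
    "sharing":         {Principal.APP: {"write"},
                        Principal.PLATFORM: {"read"}},
}


def allowed(principal: Principal, channel: str, op: str) -> bool:
    """Return True if `principal` may perform `op` on `channel`."""
    return op in ACCESS.get(channel, {}).get(principal, set())
```

For example, allowed(Principal.APP, "content_storage", "write") is False
in this model, which is the property that keeps shared content storage
from becoming a communication path between app instances.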

2.4  Advertising  and  third-party  services 

Advertising. To broadly support free apps, many of which are financed
by ads, πBox must support in-app advertising. Traditionally,
advertisers tell ad networks which ads to display, how much they are
willing to pay per impression, and the interests they are targeting. Ad
networks organize ads into lists ranked by factors such as the bid,
number of impressions already made, etc. When an app wants to display
an ad, the ad network provides an ad based on the user’s perceived
interests.

To prevent apps from leaking users’ private data to advertisers, πBox
changes this process: (1) the ad network must store its ads in content
storage on the πBox cloud platform, (2) the number of impressions must
be released via the aggregate channel (see Section 3.1), and (3) the
logic for selecting and fetching an ad from content storage (based on
the user’s profile, activities, etc.) and the logic for outputting to
the aggregate channel must be implemented inside the app (e.g., as part
of an SDK or library) and executed inside the sandbox. For efficiency,
πBox allows publishers to share content storage across multiple apps.
Since apps have read-only access, this does not affect privacy
guarantees.

πBox protects users’ identities and thus prevents ad networks from
singling out individuals who may be engaged in ad impression/click
fraud. That said, other defenses [22] (per-user thresholds on the
number of impressions/clicks, bait ads, and using historical statistics
to detect apps that pad the number of impressions/clicks) continue to
be effective even with πBox.

Ads that click through to external sites can leak a user’s identity (or
at least the IP address) and other private information.3 In πBox,
arbitrary network traffic out of the sandbox is not allowed, and
click-through ads must redirect the user to trusted platform resources,
e.g., an ad page in the ad network’s content storage.

Although not yet implemented, conventional click-through ads can be
supported in πBox with some modifications. First, all click-through
URLs must be pre-specified and static for all app instances (they
cannot be dynamically generated or otherwise based on the information
observed by a given instance). This still allows a potential leak
because the app’s choice of predefined ads to show to the user may
depend on the user’s private information, but requiring static URLs
limits the rate of leakage. Second, the platform must verify that the
click indeed originated from the user. To support this, πBox can use a
trusted powerbox dialog to prompt the user for explicit consent,
similar to the sharing channel, before permitting the click to go
through. We believe, however, that this point in the design space for
ad support sacrifices privacy, complicates the guarantees provided by
πBox, and forces users to make privacy decisions for which they may not
fully understand the implications.

πBox does not currently support ad networks that choose which ads to
serve via a real-time auction. Such auctions require either that users’
profiles be sent to the advertisers (so they know what they are bidding
on), or that all bidding logic be part of the sandbox. Alternatively,
there exist proposals for privacy-preserving ad auctions [46].
Advertising based on real-time bidding accounts for less than 30% of
all advertising sales [45], and the introduction of “Do Not Track” in
Web browsers may adversely impact auction-based advertising [17].

Third-party services. Because πBox does not allow apps to communicate
outside of the platform, apps cannot use external third-party services
such as content delivery networks (CDNs). As with ads, apps running on
πBox can only access content and use services that are hosted by the
platform provider and published in the read-only content storage.
Fortunately, many platform providers already provide services for apps,
e.g., maps from Apple, Google, and Bing, or CDN services such as Amazon
CloudFront and Google PageSpeed.

2.5  Apps  supported  by  πBox

Figure 2 lists many app features and indicates whether and how πBox
protects user privacy for each of them. In general, apps that do not
involve sharing between users are well-suited for πBox. This includes,
for example, multimedia, reference, weather, and utility apps, many of
which handle sensitive data (e.g., navigation, personal finance,
password management, malware detection, speech recognition, etc.). πBox
supports the reporting of usage statistics, user feedback, and ad
impressions.

3 For example, a set of ads may only be shown to (and thus clicked by)
users matching certain criteria or even maliciously micro-targeted to
specific individuals [30].

FIGURE 2: πBox support for different app features.

Some apps only share content occasionally, e.g., games that let users
share their scores, or camera apps that let users share some of their
pictures. For these apps, πBox protects user privacy with respect to
the app’s core functionality. Furthermore, the πBox sharing channel
ensures that any content sharing is explicitly authorized by the user
(malicious apps may still exfiltrate sensitive data by hiding it in
shared content; see Section 2.6).

Finally, there are apps (e.g., Facebook, Twitter, or multiplayer online
games) whose sole purpose is to allow users to connect, communicate,
collaborate, and share content with other users. Users of such apps
already expect to lose some of their privacy, and πBox can guarantee
relatively little for them.

Each πBox-supported app is assigned a privacy rating determined by the
channels it uses. Apps that only use the private vault, content
storage, or inbox are green: they never export any data from the
sandbox and cannot leak anything. Apps that use the aggregate channel
are yellow: they may release differentially private statistics but
there is a provable bound on the amount of information leaked. Finally,
apps that use the sharing channel are red: they rely on explicit user
consent to export information and are at a higher risk of leaking
private data. In Section 5.4, we describe how many top apps from the
Google Play store fall into these categories.
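
The rating rule can be written as a short function. This is a sketch
under assumptions: the channel names and the privacy_rating function are
hypothetical, and the rule that an app using both the aggregate and the
sharing channel is rated red (the riskier of the two) is our reading of
the text rather than something it states explicitly.

```python
# Hypothetical channel identifiers for the five channels in the text.
ALL_CHANNELS = {"private_vault", "content_storage", "inbox",
                "aggregate", "sharing"}


def privacy_rating(channels_used):
    """Map the set of channels an app uses to green, yellow, or red."""
    used = set(channels_used)
    unknown = used - ALL_CHANNELS
    if unknown:
        raise ValueError(f"unknown channels: {sorted(unknown)}")
    if "sharing" in used:
        return "red"      # exports data, gated only by user consent
    if "aggregate" in used:
        return "yellow"   # exports differentially private statistics
    return "green"        # nothing leaves the sandbox
```

Because the rating depends only on which channels an app declares, it
can be computed by the platform without inspecting the app's code.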


2.6  Limitations  and  scope 

πBox reduces privacy risks to the users of many apps and makes it more
difficult to harvest large amounts of private user information, but it
is not a privacy panacea.

First, the differentially private aggregate channel leaks a little
information with every output. This is inevitable, and we quantify this
leakage in Section 3. Note that no covert communication beyond this
leakage is possible over the aggregate channel because differential
privacy holds regardless of the recipient’s auxiliary (including
covert) information. In the case of πBox’s aggregate channel, the
timing of the release is differentially private, too, precluding a
malicious app from encoding covert information in the timing of its
aggregate outputs.

Second, while πBox’s sharing channel guarantees that only the specified
recipient can read the shared content, a malicious app may hide private
information in this content via steganography. Two factors mitigate
this risk. First, πBox shows the content to be shared to the user and
uses the powerbox mechanism to directly confirm the user’s consent to
share. Second, πBox restricts the type of content to be shared: only
plain text and images are allowed in our prototype.

This is a trade-off between usability and privacy. The design
philosophy behind πBox is to avoid involving users in privacy-critical
decisions (in contrast to the Android permission system). At the same
time, sharing is important for many applications, and πBox lets users
explicitly accept a privacy risk when sharing content.

Most importantly, πBox guarantees that shared content can only be
viewed by the recipients who have been explicitly approved by the user.
While a malicious app instance may be able to embarrass the user by
sending private information to an approved recipient, the app publisher
still does not have access to this data unless the recipient (or the
user who is sharing) cooperates.

In general, we believe that πBox will be appealing to entities looking
to (1) enhance or safeguard their existing app platforms by improving
user privacy, (2) rent privacy-preserving cloud resources to app
publishers, and/or (3) provide a curated version of their standard app
store that offers privacy-enhanced apps to enterprise customers. πBox
is an especially good fit for enterprise environments, where apps
typically contain content from a single external publisher, do not
require (in fact, frequently forbid) sharing of content outside the
enterprise, do not rely on ads, and do not involve functionalities with
multiple external parties such as brokered ad auctions.


3  Protecting  privacy 

The functionality of many apps depends, both technically and
financially, on some information about their users. Aggregate
statistics are often sufficient (for example, some ad-supported apps
only need to track the number of ad impressions, not whether a
particular user viewed a given ad), but even they may reveal
information about individuals [8, 14].

πBox uses differential privacy [14] to enable app publishers to collect
relatively accurate statistics on users’ behavior while limiting
information leaks about any individual user. Informally, differential
privacy is a framework for designing computations where the influence
of any single input on the output is bounded, regardless of the
adversary’s knowledge and/or external (auxiliary) sources of
information the adversary may have access to.

“Conventional” differential privacy techniques such as the Laplacian
mechanism (described in the following section) are primarily intended
to protect individual inputs in computations on static datasets. By
contrast, apps keep generating new data: for example, an app may
continuously update the number of times a news article has been read or
an ad has been shown. Moreover, app publishers may be interested in
rankings, such as the most popular news articles or the most frequently
misrecognized words in a transcription app. As we will show,
conventional mechanisms, while privacy-preserving, result in an
unacceptable loss of accuracy in these settings.

To balance privacy and accuracy, πBox deploys recently developed
algorithms for differentially private counters under continual
observation [16] and differentially private ranked lists [7]. To the
best of our knowledge, πBox is the first working system that uses
differential privacy under continual observation.

3.1  Counters  and  top-K  lists

In πBox, the key building block for the aggregate channel is a set of
platform-controlled counters. As an app executes, it may increment one
or more counters. Eventually, the (randomly perturbed) values of these
counters are released to the app publisher. The list of counters must
be defined by the publisher in advance. Therefore, a malicious app
instance cannot encode user-specific information in its choice of
counter names. The released counter values are differentially private
and thus probabilistically hide the influence of any given user’s data.

7tBox  enforces  user-level  differential  privacy  on  these 
counters,  i.e.,  the  privacy  of  all  data,  actions,  and  any 
other  inputs  associated  with  a  particular  user,  as  opposed 
to  the  privacy  of  a  single  input.  Formally,  for  some  pri¬ 
vacy  parameter  e  (described  further  in  Section  3.2),  a 
computation  F  satisfies  user-level  e-differential  privacy 
if,  (1)  for  all  input  datasets  D  and  D'  that  differ  only  in  a 
single  individual  user  whose  inputs  are  present  in  D  but 
not  in  D' .  and  (2)  all  outputs  S  C  Range(F), 

Pr [F(D)  €S]<ee-  Pr [F(D')  G  S]  (1) 

A standard mechanism for making any computation F differentially
private is the Laplacian mechanism, which adds random noise from a
Laplace distribution to the output of F before it is released, i.e.,
F(x) + Lap(ΔF/ε). Here Lap(y) is a Laplace-distributed random variable
with mean 0 and scale y, and ΔF is the maximum possible change in the
value of F (F's sensitivity) when a single user's inputs are removed
from the dataset.
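The following is a minimal Python sketch of the Laplacian mechanism as described above; it is an illustration, not πBox's implementation. The function names are hypothetical. The sampler relies on the standard fact that the difference of two independent exponential variables with mean b follows a Laplace distribution with mean 0 and scale b.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Lap(scale): Laplace-distributed with mean 0 and the given scale."""
    if scale == 0:
        return 0.0
    rate = 1.0 / scale
    # Difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(rate) - rng.expovariate(rate)


def laplace_mechanism(true_value: float, sensitivity: float,
                      epsilon: float, rng: random.Random) -> float:
    """Return true_value + Lap(sensitivity / epsilon)."""
    if epsilon <= 0 or sensitivity < 0:
        raise ValueError("epsilon must be positive and sensitivity non-negative")
    return true_value + laplace_noise(sensitivity / epsilon, rng)
```

A smaller ε or a larger sensitivity widens the noise distribution, which is the privacy/accuracy trade-off the text describes.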

Parameter chosen by platform provider
  Per-period privacy budget (R)

Parameters chosen by app publisher
  List of counters (L)
  Frequency of output release (f)
  Privacy parameter (ε)
  Max. # counters app instance can update per period (n)
  Max. contribution to each counter per period (s)
  Buffer size (b)
  # of ranked counters (K)

TABLE 2: Parameters for aggregate counters. b and K only apply to
delayed-output and top-K counters, respectively.

Intuitively, the more sensitive a computation is to its inputs, the
more random noise is needed to ensure a given level of privacy.
Consequently, ΔF in πBox (and, therefore, the amount of noise that
πBox adds to the released counter values) depends on the number of
counters a user can update (which we denote as n) and the maximum
amount by which a user can affect any single counter (s). There is an
important trade-off in the Laplacian mechanism between privacy (ε) and
accuracy: higher accuracy requires giving up more privacy. We will
revisit this trade-off in detail in Section 3.2.
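To make the dependence on n and s concrete, here is a small Python sketch. It assumes the sensitivity is bounded by ΔF = n · s (a user touches at most n counters and changes each by at most s); the text above only says that ΔF depends on n and s, so this exact bound is an assumption. The function names are illustrative. The sketch also uses the fact that the expected absolute value of Lap(b) is b.

```python
def noise_scale(n: int, s: float, epsilon: float) -> float:
    """Laplace scale for sensitivity n * s (assumed bound) and privacy parameter epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return (n * s) / epsilon


def expected_relative_error(true_value: float, n: int, s: float,
                            epsilon: float) -> float:
    """Expected |noise| divided by the true counter value."""
    # E[|Lap(b)|] = b, so the expected absolute error equals the scale.
    return noise_scale(n, s, epsilon) / true_value
```

For instance, with n = 10, s = 1, and ε = 0.5, the scale is 20, so a counter whose true value is 1,000 sees about 2% expected error, while a counter whose true value is 20 sees about 100%.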

Supporting periodic updates. Many apps dynamically update counters
during execution and then need to periodically release them. The
Laplacian mechanism can be applied to every release, but if the timing
of releases is independent of the counter's true value, the random
noise added by the mechanism (which, too, is independent of the
counter's value) can be much larger than the true value, resulting in
high relative error. This arises, for instance, when counting the
number of impressions for rarely displayed ads targeting a niche group
of users.

πBox uses delayed-output counters [16] instead. Figure 3 describes how
such a counter is implemented. Intuitively, this mechanism randomly
delays releases of the counter value; if the value is small relative to
the noise that must be added, the release is likely to be postponed.

Furthermore, rather than allowing counters to be continuously queried,
πBox enforces a minimum interval between releases (line 5). Thus, even
the counters that have internally accumulated a large number of updates
may not be immediately released. Delaying the release may affect the
freshness of the released values, but the relative error will be
smaller.

FIGURE 3: Delayed-output counter.

Supporting ranked top-K lists. To release top-K lists, πBox adapts
techniques by Bhaskar et al. [7]. The app publisher specifies K
beforehand, and the amount of noise that is added is proportional to K,
which is typically smaller than the amount of noise (proportional to n)
that would have been added if we had used the Laplacian mechanism on
every counter to determine the top K. To generate a ranking of the
counters without their associated values, the algorithm adds
Lap(4Ks/ε) random noise to the values of all counters and picks the top
K counters based on these noisy values. If an app publisher needs to
know the actual values of the associated counters as well, the
algorithm adds an additional Lap(2Ks/ε) noise to the true values of the
selected K counters before releasing their values.
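The two-step top-K release described above can be sketched in Python as follows. This is a simplified illustration, not the algorithm of Bhaskar et al. [7] in full or πBox's code; the function names are hypothetical, and the sketch assumes the ranking noise has scale 4Ks/ε and the value noise has scale 2Ks/ε.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Lap(scale) as the difference of two exponentials with mean `scale`."""
    if scale == 0:
        return 0.0
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_top_k(counters: dict, k: int, s: float, epsilon: float,
                rng: random.Random, with_values: bool = False):
    """Rank counters by noisy value; optionally attach separately perturbed values."""
    rank_scale = 4 * k * s / epsilon  # assumed scale for the ranking step
    noisy = {name: value + laplace_noise(rank_scale, rng)
             for name, value in counters.items()}
    top = sorted(noisy, key=noisy.get, reverse=True)[:k]
    if not with_values:
        return top
    value_scale = 2 * k * s / epsilon  # fresh noise on the true values
    return [(name, counters[name] + laplace_noise(value_scale, rng))
            for name in top]
```

Note that both noise scales grow with K rather than with the total number of counters, which is the property the paragraph above highlights.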

It may appear that the ability to release top-K lists allows apps to
leak sensitive information. For example, the publisher of a password
management app could learn the K most common user passwords (in any
case, these are already well-known). Note, however, that the publisher
cannot learn the password of any given user. Similarly, conventional
differential privacy allows the publisher to ask how many users have a
particular password, but the answer does not reveal any specific user's
password.

Finally, πBox's aggregate channel can be extended to support other
differentially private functions such as mean and threshold [48].

3.2  Choosing  privacy  parameters 

Absolute privacy cannot be achieved: as long as the released values
have any utility, the original data can be reconstructed after
observing at most a linear (in the size of the dataset) number of
values [13]. To model the cumulative loss of privacy after multiple
computations on the same private data, differential privacy uses the
notion of a privacy budget [15, 35]. Every ε-private computation
charges ε cost to this budget. The higher the value of ε, the less
noise is added, thus the released value is more accurate, but the
privacy cost is correspondingly higher, too. The budget is pre-defined
by the data owner. Once it is exhausted, no further release is allowed.

In our setting, it is undesirable for an app to lose functionality
after a while. Instead, πBox enforces a per-period privacy budget that
bounds privacy loss per period by parameter R, which is chosen by the
platform provider. For a given R, the app publisher may specify the
types of the counters the app will release (delayed-output and/or top-K
with or without associated values), as well as the relevant parameters
in Table 2, so long as

$$c \cdot f \le R \qquad (2)$$

where c = ε/2 for top-K counters without associated values and c = ε
for the other two types of counters.
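As a hedged illustration of how a platform might check a publisher's choices against condition (2), read here as c · f ≤ R, consider the Python sketch below. The counter-type strings and function names are hypothetical and not part of πBox's interface.

```python
def per_release_cost(counter_type: str, epsilon: float) -> float:
    """Return c: epsilon / 2 for rank-only top-K lists, epsilon otherwise."""
    if counter_type == "top_k_ranks_only":
        return epsilon / 2
    if counter_type in ("delayed_output", "top_k_with_values"):
        return epsilon
    raise ValueError(f"unknown counter type: {counter_type}")


def within_budget(counter_type: str, epsilon: float,
                  releases_per_period: float, budget: float) -> bool:
    """Check c * f <= R, where f is the number of releases per period."""
    cost = per_release_cost(counter_type, epsilon)
    return cost * releases_per_period <= budget
```

Under this reading, a rank-only top-K list can be released twice as often as a delayed-output counter for the same ε and R.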

To understand how c and ε relate to the amount of information leaked,
let P be an adversary's prior probability of the user's private data
having a particular value and P' be the posterior probability after
observing the released counters. Condition (1) ensures that
P' ≤ e^c · P, i.e., any released value changes the adversary's prior
probabilities (no matter what they are!) by no more than a constant
multiplicative factor. If uncertainty is measured as min-entropy of the
adversary's probability distribution over the private data,⁴ every
release yields (c · log₂ e) bits of information to the adversary
[1, 6]. Given this representation of uncertainty, πBox's counters
release at most (f · c · log₂ e) ≤ (R · log₂ e) bits per period. For
example, an app that uses delayed-output counters with ε = 1 and a
release frequency f of once per day leaks at most 1.44 bits of
information daily.
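The 1.44-bit figure follows directly from log₂ e ≈ 1.4427. A short Python check, with illustrative function names, reproduces the arithmetic:

```python
import math


def bits_per_release(c: float) -> float:
    """Min-entropy loss of one release with cost c: c * log2(e) bits."""
    return c * math.log2(math.e)


def bits_per_period(c: float, f: float) -> float:
    """Loss per period for f releases of cost c each: f * c * log2(e) bits."""
    return f * bits_per_release(c)
```

With c = ε = 1 (a delayed-output counter) and f = 1 release per day, `bits_per_period(1.0, 1)` evaluates to about 1.44.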

While it is straightforward to calculate how much noise should be added
for a given choice of counter type and ε, the utility of a particular
counter arguably depends not just on the amount of noise added, but
also the actual true counter value, i.e., the relative amount of noise
matters. The larger the true value, the larger the absolute noise that
can be tolerated for a given relative error, thus allowing for smaller
values of ε.

As long as condition (2) is followed, app publishers are free to choose
the types of the counters used by their apps and the values of the
parameters listed in Table 2. For example, a publisher may want more
frequent output (f), at the expense of lower ε, a higher noise scale,
and thus lower accuracy. To maintain the same accuracy, the publisher
may keep the same noise scale at the cost of decreasing the maximum
number of counters a single app instance can update (n) and/or the
maximum amount it can contribute (s).

4  Implementation 

We implemented a prototype of πBox using Android 2.3 (Gingerbread) for
the device client; Jetty [29], a Java servlet container, for the remote
services; and HBase [23] for the cloud communication and storage
channels. The trusted computing base (TCB) consists of the above
software, the cloud operating system (Linux in our case), and the πBox
implementation, which itself is approximately 7,500 lines of code for
the cloud half and 2,700 for the device half. The design of πBox is
largely agnostic to the specific sandboxing technology and could have
used virtual machines, Native Client [39], or more advanced sandboxes,
which would change the size of the TCB.

⁴ The min-entropy of a probability distribution that assigns
probability p_i to some event i is −(max_i log₂(p_i)).

4.1  Isolation  and  authentication 

Client isolation. To implement the sandbox on the device, we augmented
Android's built-in sandboxing mechanism. By default, Android assigns
each app a unique user identifier (UID). πBox allows
non-privacy-preserving apps to coexist with privacy-preserving apps on
the same device, but assigns UIDs from different ranges to apps of
different types. This makes isolation enforcement simpler in the kernel
code.

Android uses standard Linux permissions to isolate apps from each
other, but this is not enough to prevent an app from abusing the
permissions it has. To prevent πBox-confined apps from leaking private
data, we modify Android to block them from creating world-readable
files or directories, and from writing to files or directories owned by
another app's UID.⁵ πBox does not allow confined apps to communicate
with other non-system apps via IPC, including Binder IPC (the basic
primitive for various higher-level Android IPC mechanisms). Finally, we
use iptables to confine the apps' network traffic. These changes are
applied at the kernel level only to πBox-confined apps (recognized by
their UIDs).

Cloud isolation. We implement the server-side functionality for πBox
apps as Java servlets using Jetty. Many existing Web apps, e.g., those
on Google App Engine [4], can thus be easily adapted to πBox.

In Jetty, each app is isolated in a separate Web app context (a
container that shares the same Java class loader). In πBox, each user
of an app is also isolated in a separate context, achieving
classloader-level isolation. To restrict the servlet's communication
via system resources, we rely on Java's security monitor. Our sandbox
also includes many other restrictions used by Google App Engine, e.g.,
disallowing reflection and controlling access to JVM-wide resources
such as system properties.

Authentication. When an app on the user's device wants to communicate
with its cloud-based half, it sends an “intent” (a high-level IPC
mechanism in Android) to πBox's local trusted authentication service,
implemented as a system app. After identifying the requesting app, the
authentication service requests the user's credentials via user input
or from a cache and sends them, along with the app's ID, through a TLS
tunnel to πBox's authentication manager in the cloud. Upon successful
authentication, the authentication manager sets up a new servlet
instance at a specific URL, establishes an IPsec endpoint on the
machine where the servlet is instantiated, and sends this URL, a
one-time password that is required to access the servlet instance, and
the IPsec key to the authentication service on the user's device.

⁵ This implies that an app can only write to directories that it alone
has read access to and that other apps cannot see the files it has
written.

The authentication service establishes the other end of the IPsec
tunnel on the device, updates iptables to allow the app to communicate
with the servlet, and passes, via intent, the URL and password to the
app. IPsec ensures that all communication to and from the servlet is
encrypted, and iptables ensures that the app on the user's device can
only communicate with the user's servlet instance via this IPsec
tunnel. Finally, the app running locally on the user's device
authenticates using the provided password via HTTP basic authentication
over the IPsec tunnel (which encrypts the credentials); this step
ensures that only this specific app can communicate with the servlet.
Once this process is complete, the app can send HTTP requests to the
provided URL and receive HTTP responses from its cloud component.
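The final step uses standard HTTP basic authentication, in which the client sends a base64-encoded "username:password" pair in the Authorization header. The Python sketch below shows only how such a header is formed; the function name is illustrative, and the text does not say which username πBox pairs with the one-time password, so the username argument is a placeholder.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build the Authorization header for HTTP basic authentication."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

Because base64 is an encoding rather than encryption, the header is readable by anyone who observes it, which is why the text stresses that it travels inside the IPsec tunnel.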

4.2  Storage  and  communication  channels 

πBox's storage systems use local device storage and HBase, a popular
NoSQL storage system. Local device storage is part of πBox's private
vault. Any data that is written to local storage is secured as
described in Section 4.1 and cannot be exported from the sandbox.
Access to cloud storage is provided via an HBase-like API.

When an app publisher submits an app to the platform, the publisher
provides a WAR (Web application ARchive) file that contains the app's
servlet code and XML files that describe the schemas of the HBase
tables that the app needs for each type of cloud storage. To implement
various channels, πBox provides wrappers of the HBase client that
expose the appropriate interfaces to servlet instances. For example,
the interface to content storage exposes read-only operations on the
storage's shared tables. The interface to the cloud-backed private
vault provides both read and write access to the per-sandbox table. The
wrapper for the aggregate channel exposes an update-only interface for
the counters, which are stored in the HBase tables by πBox. Stored
counter values are periodically released by (1) sanitizing them via the
differential privacy module using the parameters provided by the app
publisher (Section 3.2) and (2) writing them to a table that can be
read by the publisher.
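The idea of exposing a different interface per channel can be sketched in Python with plain dictionaries standing in for HBase tables. The class and method names are hypothetical and do not reflect πBox's or HBase's actual API; Python also does not enforce the restriction the way a sandboxed wrapper would, so the sketch only shows the shape of each interface.

```python
class ContentStorageView:
    """Read-only access to a shared table."""

    def __init__(self, table: dict):
        self._table = table

    def get(self, key):
        return self._table.get(key)


class PrivateVaultView:
    """Read and write access to a per-sandbox table."""

    def __init__(self, table: dict):
        self._table = table

    def get(self, key):
        return self._table.get(key)

    def put(self, key, value):
        self._table[key] = value


class AggregateChannelView:
    """Update-only access to counters declared in advance."""

    def __init__(self, counter_names):
        self._counters = {name: 0.0 for name in counter_names}

    def increment(self, name: str, amount: float = 1.0):
        if name not in self._counters:
            raise KeyError(f"counter not declared by publisher: {name}")
        self._counters[name] += amount
```

The aggregate view deliberately has no read method: the servlet can contribute to counters but only the platform reads them when producing a sanitized release.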

The per-sandbox inbox allows a user's servlet to receive messages from
the app publisher or from another user's servlet for the same app. This
inbox is implemented using an HBase table in which each row corresponds
to a single message. The row includes the sender's platform username
(the name used to authenticate with the authentication service or a
special username reserved for the app's publisher), a timestamp, and
the message body. Messages from the publisher are delivered to the
recipient's inbox by a designated servlet, which can be invoked only by
the authorized publisher.

FIGURE 4: Latency vs. throughput for πBox mechanisms.

FIGURE 5: Overhead of user isolation for various workloads.

FIGURE 6: Fraction of time the true top-K documents appear in the noisy
top-K list.

Lastly, when an app wants to share content through the sharing channel,
it sends an intent, along with the content to be shared, to πBox's
sharing service, which is implemented as part of the authentication
service. The sharing service prompts the user for the recipients'
usernames and sends the message, along with the usernames of the sender
and the recipients, to a designated servlet that only the platform can
access. This servlet then adds the message to the inbox of each
recipient.
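The inbox layout and the sharing step (one row per message, holding the sender's username, a timestamp, and the body, appended to each recipient's inbox) can be sketched in Python as follows. An in-memory dictionary stands in for the HBase table, and all names are hypothetical.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class InboxMessage:
    sender: str       # platform username of the sender
    timestamp: float  # seconds since the epoch
    body: bytes


def deliver(inboxes: dict, sender: str, recipients: list,
            body: bytes, now: float = None) -> None:
    """Append one message row to the inbox of every recipient."""
    ts = time.time() if now is None else now
    for recipient in recipients:
        inboxes.setdefault(recipient, []).append(
            InboxMessage(sender, ts, body))
```

In πBox this step runs in a servlet that only the platform can invoke, so the sender field is set by trusted code rather than by the app.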


5  Evaluation 

5.1  Performance  overhead 

We evaluate πBox using a server with two four-core Xeon E5430 CPUs and
16 GB of RAM and 4 clients, each with a single-core 3 GHz Pentium 4
Xeon CPU with hyperthreading and 1 GB of RAM, all running Fedora 8.

We first use micro-benchmarks to measure the throughput and response
time of the various mechanisms employed by πBox on two types of
workloads: a simple static workload where the server responds with
about 10 bytes of static HTTP body data, and a computationally
intensive workload where the server randomly generates 1 MB of data and
calculates its SHA-256 hash. We generate the workloads by having a
varying number of clients continuously submit requests over a 30-second
interval.
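The computationally intensive workload amounts to the following per-request work, shown here as a Python sketch rather than the Java servlet used in the evaluation. The function name is illustrative, and treating 1 MB as 1,048,576 bytes is an assumption.

```python
import hashlib
import os


def sha256_workload(size_bytes: int = 1024 * 1024) -> str:
    """Generate random data of the given size and return its SHA-256 hex digest."""
    data = os.urandom(size_bytes)
    return hashlib.sha256(data).hexdigest()
```

Each call does enough hashing work that per-request overheads such as IPsec or security checks become a small fraction of the total cost.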

Figure 4 shows the results with different components turned on. In the
base configuration, we run the server with the Java security monitor
disabled, no isolation (i.e., a single servlet instance serves all
client requests), and without an IPsec tunnel between the server and
the clients. We then enable the security monitor, run multiple servlet
instances to serve different clients, and/or enable IPsec. For the
simple static workload, πBox reduces the throughput of the system by
roughly 50%, incurring an overhead of 0.17 ms per operation. For the
heavier SHA-256 workload, however, the computation required to generate
the hash effectively hides the overhead of πBox.

To measure the overhead of isolating app instances, we fix the load
offered to the server (i.e., the number of requests generated by the
clients) and vary the number of Web app containers (i.e., per-client
servlet contexts) on the server. Figure 5 shows the throughput and
response time of πBox for three types of workloads, with requests
uniformly distributed across the containers. The static and SHA-256
workloads are the same as in the previous experiment. In the news
reader workload, clients request a list of new articles (about 300) and
a specific article (5 to 10 KB) from the servlet half of our news
reader app (Section 5.3). This causes many I/O-intensive operations on
the small HBase instance that stores the articles. As Figure 5 shows,
the overhead of user isolation is insignificant for all three workload
types.

5.2  Privacy  vs.  accuracy 

To show that the differential privacy mechanisms employed by πBox
provide reasonable accuracy in real-world scenarios, we first apply the
top-K mechanism to the 60-day Web server trace of the 1998 World Cup
website [59]. For each day, we calculate the top 5 and top 10 most
frequently accessed documents and use πBox's aggregate channel to
output “noisy” top-5 and top-10 lists. The total number of daily
accesses for a top-10 document ranged from 6,000 to 14,000.

Figure 6 shows, as a function of the privacy parameter ε, how often the
true top-5 and top-10 documents on a particular day appeared in the
noisy, privacy-preserving top-5 and top-10 lists output by the
aggregate channel. As ε increases, the accuracy of the noisy rank lists
improves. For example, the 8th-ranked item appears in the noisy top-10
list 75% of the time when ε = 0.05, but 95% of the time when ε = 0.2.
This percentage is even higher for items with higher true ranks.
Figure 7 shows the average noisy rank given to true top-5 and top-10
documents. The accuracy of the noisy rank improves with higher ε; with
ε = 0.2, all ranks are correct.

FIGURE 7: Average noisy rank for a given true rank.

FIGURE 8: Accuracy of delayed-output counter on two different
documents. We use ε = 1, |L| = 100, and b = 500.

FIGURE 9: Interactions and data flow between the news reader app and
πBox. The dark (solid, dotted) lines represent the flow from the
(content, ad) publisher. The lighter lines represent the same flows for
another user of the same app.

To illustrate the advantages of the delayed-output mechanism for
releasing infrequently updated counters, we use a trace from the
University of Saskatchewan Web server [49], which contains a variety of
access patterns. For this experiment, we set ε = 1, the total number of
delayed-output counters (|L|) to 100, the buffer size (b) to 500, and
the release frequency to 1 week. We compare the delayed-output counter
to a basic counter that simply outputs its differentially private value
every week. Figure 8 shows the values of the delayed-output counter and
the basic counter over a 30-week span for two documents with different
access patterns. For the frequently accessed document, the
delayed-output counter is off by 12.9% on average vs. 19.6% for the
basic counter. For the less frequently accessed document, the
delayed-output counter is much more accurate, with a relative error of
15.6% vs. 83.1% for the basic counter.

5.3  Apps 

To illustrate how to build useful privacy-preserving apps in πBox, we
developed three sample apps and ported two existing open-source apps.

Password manager. A password manager is an example of an app that needs
to keep (but not share) sensitive data, e.g., store a user's
credentials in the cloud so that the user can access them from
different devices and to avoid keeping them on the devices themselves.
Although many such apps use encryption, the user must trust that the
app's publisher is neither malicious nor incompetent.

Our πBox-based password manager app simply stores the user's passwords
in its cloud-backed private vault, enabling their retrieval from
multiple devices. Despite its simple design, the app guarantees that
(1) only a specific user can access the stored passwords via the app,
and (2) the app cannot leak the stored passwords to anyone else (i.e.,
this app is “green”; see Section 2.5). This benefits both the user, who
does not have to worry about the trustworthiness of the app, and the
app publisher, who can rely on πBox to secure the app's storage.

News reader. Our news reader app is an example of an ad-supported media
browsing and consumption app that uses πBox's storage systems and
involves multiple publishers. Figure 9 shows the flow of data between
the publishers, the app, and the platform.

The main functionality in any news reader app is displaying content
(news articles) to the user. In our implementation, the publisher
supplies the articles by adding to, updating, and removing from the
app's content storage located on πBox's cloud platform (Figure 9, 1a).
The app has read-only access to this storage (Figure 9, 2a).

The news reader may provide personalized content to the user, for
example, recommend certain articles based on the user's reading
history. It can track the user's reading history by writing to its
private vault (Figure 9, 3a). Because the vault is per-user and
per-app, this data cannot leak to other app instances or the app
publisher.

Many apps of this type are ad-supported. The ads may be published by
either the app publisher or a separate entity, e.g., an advertising
network partnering with the app publisher. Like the news articles, the
ads are published and updated by their publisher and viewed by the user
via content storage (Figure 9, 1b and 2b). Any personalization and
micro-targeting is done by writing the relevant data to the private
vault (Figure 9, 3b).

Both news and ad publishers may want to know how often their articles
and ads have been viewed. Our app keeps one counter per article and ad.
We use a top-10 list to track the most popular articles (Figure 9, 5a)
and delayed-output counters for ad impressions (Figure 9, 5b), since
the latter do not need to be released frequently.

The news reader app is a “yellow” app: although it exports statistics,
πBox provides differential privacy guarantees to its users. It is
straightforward to extend the news reader to let users share
interesting articles with other users, which would make the app “red.”

Transcription. Our transcription app uses cloud-based voice
recognition. It records the user's speech on the device and transmits
it to a servlet, which writes the recording to per-sandbox temporary
scratch space and executes Sphinx-4 [53], an open-source speech
recognition toolkit, to transcribe the text. The transcription is then
sent back to and displayed by the app on the device. Our current
prototype keeps the dictionary in the app's binary, but we could also
use content storage for this purpose, allowing the publisher to update
the dictionary.

This app uses the aggregate channel to release the confidence scores of
speech recognition for each ℓ-gram (ℓ = 1 in our prototype). First, the
app publisher defines counters for all words in the Sphinx-4 dictionary
(per Table 2, L is the list of these counters, n = |D|). Sphinx-4
provides confidence scores that range from 0 (low confidence) to 1.
Because the publisher is likely interested in the most misrecognized
words, our app inverts the score (thus making higher scores reflect
lower confidence, up to a maximum of s = 1) before adding it to the
previous value of the counter. The top-K list thus contains the K
(10 in our prototype) most misrecognized words.
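The counter update described above can be sketched in Python as follows. This is an illustration only: the function name is hypothetical, a dictionary stands in for the platform-controlled counters, "inverting" is taken to mean 1 minus the confidence score, and the per-period caps n and s from Table 2 are not enforced here.

```python
def record_recognition(counters: dict, word: str, confidence: float) -> None:
    """Add the inverted confidence score (1 - confidence) to the word's counter."""
    if word not in counters:
        # Counters must be declared by the publisher in advance.
        raise KeyError(f"no counter declared for word: {word}")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    counters[word] += 1.0 - confidence
```

Words that are recognized with low confidence accumulate larger counter values, so they rise toward the top of the released top-K list.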

The transcription app is a “yellow” app. πBox guarantees that, even if
the recordings of the user's speech contain highly sensitive data, the
app can leak this data only through the differentially private
aggregate channel as (noisy) top-K word lists, which do not identify
the actual words spoken by specific users.

Porting existing apps. We ported OsmAnd [41], an Android navigation app
based on OpenStreetMap [40], and ServeStream [51], an HTTP-streaming
media player and media server browser, to πBox.

The major changes to the apps involved (1) adding code to initiate
authentication via πBox's authentication service, (2) modifying all of
the app's HTTP requests to include the authentication credentials
provided by the authentication service (Section 4.1), and (3) moving
map and media content into πBox's content storage and serving them via
servlets. The use of HTTP as the communication protocol simplified
porting these apps to πBox, and this simplification is likely to apply
to many other apps. Overall, for OsmAnd, we modified or added 174 out
of 119,147 lines of code; for ServeStream, 133 out of 13,193 lines.

FIGURE 10: Number of top-10 apps in Google Play categories (as of
Feb. 2013) that can be supported by πBox. Unsupported apps are
uncolored/white. Stripes represent apps that, due to non-core sharing
or unsupported functionality, are one color but whose core
functionality is another color, e.g., a PDF viewer that allows sharing
is red, but its core is green.

Both  ported  apps  use  only  the  private  vault  and  content 
storage,  making  them  “green.” 


5.4  Coverage  of  existing  apps 

To further evaluate how well πBox can support existing app
functionalities, we surveyed the top 10 free apps and top 10 paid apps
from all categories (excluding wallpaper, widget, and library) in the
Google Play app store, for a total of 30 categories and 600 apps. This
survey was based solely on the developer's description of the app in
Google Play, thus the reported numbers are only estimates.

Figure 10 shows how many apps can be supported by πBox and the degree
of support. Among the paid apps, 46% are green, 18% red, and 36%
unsupported; considering only core functionality, 74% are green. Among
the free apps, 37% are yellow, 21% red, and 42% unsupported;
considering only core functionality, 67% are yellow. Unsurprisingly,
many of the unsupported apps are those that are categorized as
“communication” or “social” and thus require frequent sharing of data.
Free apps are largely ad-supported and thus at least yellow.

6  Related  work 

xBook [52] and the system of Viswanath et al. [57] employ an extended
sandbox mechanism similar to πBox for social-networking services. These
systems protect user information stored on the platform (e.g., users'
profiles and social relationships). Hails [21] protects user data on
the platform using language-level information flow control. Unlike
πBox, none of these systems can protect private information that apps
directly receive or infer from their interactions with the users.

In xBook, each user decides whether to allow a particular domain to
access a given part of the user's profile. By contrast, πBox simplifies
users' decision-making by color-coding the apps based on their
potential for privacy violations. xBook anonymizes app statistics (with
no formal privacy guarantees), while the system of Viswanath et al.
uses conventional differential privacy. As we show in Section 5.2, this
can lead to high relative errors when releasing rarely updated values.

Embassies [26] is somewhat similar to πBox in that it aims to secure
apps through a minimal interface that allows most apps to function
correctly. Unlike in πBox, app publishers are not viewed as adversaries
with respect to the user data collected by the app.

Dynamic taint analysis tracks the flow of sensitive
data through program binaries [10, 24, 62] and can
help protect user privacy. For example, TaintDroid [18]
detects (rather than prevents) privacy violations, while
AppFence [25] uses data shadowing and exfiltration
blocking to prevent tainted data from leaving the device.
Neither system handles implicit leaks. While taint-based
systems can track specific data items such as device ID,
they cannot prevent the app from leaking information
about the user’s behavior (e.g., articles the user has read).
In general, dynamic taint tracking is complementary to
the guarantees provided by πBox. For example, it can be
used to prevent certain data items from being declassified
even via differentially private channels.
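
The gap between explicit and implicit flows can be illustrated with a toy
taint model. The sketch below is not how TaintDroid or AppFence is
implemented; the `Tainted` class, the `send` sink, and both copy functions
are hypothetical names introduced only for this example. It shows that
taint follows a value through data operations, but a copy rebuilt purely
through branching carries no taint and passes the sink.

```python
class Tainted:
    """An integer paired with a taint bit (toy model, not a real tracker)."""

    def __init__(self, value, tainted=False):
        self.value = value
        self.tainted = tainted

    def __xor__(self, other):
        # Explicit data flow: the result inherits taint from either operand.
        if isinstance(other, Tainted):
            return Tainted(self.value ^ other.value,
                           self.tainted or other.tainted)
        return Tainted(self.value ^ other, self.tainted)

    def bit_is_set(self, i):
        # Branch conditions yield a plain bool; this model does not
        # propagate taint to control-dependent assignments.
        return bool(self.value & (1 << i))


def send(payload):
    """Hypothetical network sink that blocks tainted payloads."""
    if payload.tainted:
        raise PermissionError("tainted data may not leave the device")
    return payload.value


def explicit_copy(secret, key=0x5A):
    # Obfuscating with XOR does not help the app: taint follows the data.
    return secret ^ key


def implicit_copy(secret, width=16):
    # Rebuild the secret bit by bit through control flow only.
    leaked = 0
    for i in range(width):
        if secret.bit_is_set(i):
            leaked |= 1 << i
    return Tainted(leaked)  # fresh value, never marked tainted
```

Calling `send(explicit_copy(device_id))` on a tainted `device_id` raises
`PermissionError`, whereas `send(implicit_copy(device_id))` returns the
original value unhindered.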

Bubbles [55] aims to capture privacy intentions by
clustering data into “bubbles” based on explicit user behavior.
The privacy guarantee is similar to that of πBox’s
sharing channel: once the user adds a friend to a bubble,
this friend gains access to all data in that bubble. Bubbles
is limited to apps that run on the client device only.

ObliviAd [5] and PrivAd [22] are privacy-preserving
online advertising systems that aim to protect user profiles
from ad brokers. ObliviAd creates a black box at
the ad broker using a secure coprocessor and oblivious
RAM. This black box serves ads to clients, receives reports
about ad clicks and impressions from clients via
a secure TLS channel, records which ads were clicked
or viewed (but not who viewed an ad), and only releases
these records in large batches to make it difficult to determine
who saw which ad. In PrivAd, clients fetch a large
set of ads that are roughly based on users’ interests; more
accurate targeting is done only at the client. When the
client reports which ads have been shown, a trusted third
party anonymizes the client’s identity before sending the
data to the ad broker. By contrast, πBox aims to provide
rigorous privacy guarantees without sacrificing the ability of
advertisers to obtain accurate impression counts.

PINQ  [35]  and  Airavat  [48]  are  centralized  platforms 
for  differentially  private  computations  on  static  datasets. 
PDDP  [9]  is  a  distributed  differential  privacy  system  in 
which  participants  maintain  their  own  data. 

While the cloud provider is trusted in πBox,
CloudVisor [61] and CryptDB [44] focus on untrusted
clouds. CloudVisor hides users’ data from the hypervisor
using nested virtualization, while CryptDB uses encryption.
CLAMP [42] employs isolation and authentication
mechanisms that are similar to those of πBox to protect
private data in LAMP-like Web servers. It focuses on
compromised servers rather than malicious applications.

πBox can be viewed as imposing a mandatory information
flow policy on untrusted apps. Previous work on
information flow control includes [12, 32, 38, 60] and
hundreds of other papers.

Bring-Your-Own-Device approaches that support dual
workspaces [3, 58] enable personal and corporate data to
coexist on the same device while permitting only trusted
apps to access the corporate data. πBox takes this idea a
step further and allows untrusted apps to run on corporate
data, thus realizing the idea of Bring-Your-Own-App.

7  Conclusion 

πBox is a new app platform that combines support for
apps’ functional needs with rigorous privacy protection
for their users. Our evaluation demonstrates that
πBox can be used in many practical scenarios, including
“bring-your-own-app” enterprise deployments where
external apps operate on proprietary company data.